Method for dynamic context scope selection in hybrid n-gram language modeling

ABSTRACT

A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in the LSA space are computed. Further, the local probabilities and the global probabilities are combined to produce the language model.

This application is a continuation of co-pending U.S. patent application Ser. No. 10/917,730, filed on Aug. 12, 2004, which is a continuation of U.S. patent application Ser. No. 10/243,423, filed on Sep. 12, 2002, now issued as U.S. Pat. No. 6,778,952, which is a continuation of U.S. patent application Ser. No. 09/523,070, filed on Mar. 10, 2000, now issued as U.S. Pat. No. 6,477,488.

FIELD OF THE INVENTION

The present invention relates to language modeling systems and more particularly to dynamic selection of context scope in latent semantic analysis (LSA) language modeling.

BACKGROUND OF THE INVENTION

In general, speech recognition is the process of converting an acoustic signal into a linguistic message. In certain applications, for example where a speech recognition processor serves as a user interface to a database query system, the resulting message may need to contain enough information to reliably communicate a speaker's goal in accessing the database. However, in an application such as automated dictation or computer data entry, it may be necessary that the resulting message represent a verbatim transcription of a sequence of spoken words. In either case, an accurate statistical, or stochastic, language model is desirable for successful recognition.

Stochastic language modeling plays such a role in large vocabulary speech recognition, in which it is typically used to constrain the acoustic analysis, guide the search through various (partial) text hypotheses, and/or contribute to the determination of the final transcription. Statistical language models using both syntactic and semantic information have been developed. This approach embeds latent semantic analysis (LSA), which is used to capture meaningful word associations in the available context of a document, into the standard n-gram paradigm, which relies on the probability of occurrence in the language of all possible strings of n words.

This new class of language models, referred to as (multi-span) hybrid n-gram plus LSA models, has shown a substantial reduction in perplexity. In addition, multi-span models have also been shown to significantly reduce word error rate. However, their overall performance tends to be sensitive to a number of factors. One such factor is the training data used to derive the statistical parameters, in particular those associated with the LSA component. This problem is common in statistical language modeling and can be addressed by careful matching of training and test conditions. In addition, the dynamic selection of the LSA context scope during recognition affects the overall effectiveness of the hybrid models.

SUMMARY OF THE INVENTION

A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in the LSA space are computed. Further, the local probabilities and the global probabilities are combined to produce the language model.

BRIEF DESCRIPTION OF THE DRAWINGS

Features and advantages of the present invention will be apparent to one skilled in the art in light of the following detailed description in which:

FIG. 1 is a block diagram of one embodiment for a hybrid speech recognition system;

FIG. 2 is a block diagram of one embodiment for a computer system architecture of a hybrid speech recognition system;

FIG. 3 is a block diagram of one embodiment for a computer system memory of FIG. 2;

FIG. 4 is a block diagram of one embodiment for a text co-occurrence matrix of FIG. 3;

FIG. 5 is a block diagram of one embodiment for singular value decomposition matrices of the co-occurrence matrix of FIG. 4; and

FIG. 6 is a flow diagram of one embodiment for dynamic selection of context scope in latent semantic analysis (LSA) language modeling.

DETAILED DESCRIPTION

A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in the LSA space are computed. Further, the local probabilities and the global probabilities are combined to produce the language model.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory, in the form of a computer program. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

FIG. 1 is a block diagram of one embodiment for a hybrid speech recognition system 100. Referring to FIG. 1, an input signal 101 is received by transducer 102. The transducer is connected to a signal preprocessor 120. In one embodiment, signal preprocessor 120 includes analog-to-digital (A-D) converter 140 and feature extractor 104. Acoustic signal 101 is input to transducer 102, and the output of transducer 102 is coupled to an input of A-D converter 140.

Output from signal preprocessor 120 is sent to hybrid training/recognition processor 108. Hybrid training/recognition processor 108 performs speech recognition using a hybrid language model 130, which combines local and global language constraints to realize both syntactic and semantic modeling benefits. Hybrid training/recognition processor 108 uses acoustic models 110 and a lexicon 112 to evaluate the feature vectors output by feature extractor 104. In general, lexicon 112 defines the vocabulary of the recognition system 100 in terms of basic speech elements (words), and language model 130 defines allowable sequences of vocabulary items. Hybrid training/recognition processor 108 combines n-gram syntactic analysis with latent semantic analysis (LSA), as described in U.S. Pat. No. 5,839,106, entitled “Large-Vocabulary Speech Recognition Using an Integrated Syntactic and Semantic Statistical Language Model”, which is incorporated herein by reference. Hybrid training/recognition processor 108 outputs a word sequence output 114.

FIG. 2 is a block diagram of one embodiment for a computer system architecture for a hybrid speech recognition system. Referring to FIG. 2, computer system 200 includes processor 202, digital signal processor 208, memory 204, and mass storage device 207, connected via system bus 201. System bus 201 is also coupled to receive inputs from a keyboard 222, pointing device 223, and speech signal input device 225. In addition, system bus 201 provides outputs to display device 221 and hard copy device 224.

FIG. 3 is a block diagram of one embodiment for a computer system memory of computer system 200. Referring to FIG. 3, input device 302 provides speech signals to a digitizer 304. Digitizer 304, or feature extractor, samples and digitizes the speech signals for further processing. Digitizer 304 may include storage of the digitized speech signals in the speech input data memory component of memory 310 via system bus 308. Digitized speech signals are processed by digital processor 306 using algorithms and data stored in the components of memory 310.

In one embodiment, digitizer 304 extracts spectral feature vectors every 10 milliseconds. In addition, a short-term Fast Fourier Transform followed by a Filter Bank Analysis is used to ensure a smooth spectral envelope of the input spectral features. The first and second order regression coefficients of the spectral features are extracted. The first and second order regression coefficients, typically referred to as delta and delta-delta parameters, are concatenated to create training set 312. During the training phase, a collection T of N articles is input into training set 312. Training set 312 contains a number of words that constitute a vocabulary V. In one embodiment, the occurrences of each word $v_i$ in V are counted and saved as text co-occurrence matrix 314, an M×N matrix W which represents the co-occurrences between words in V and documents in T. In one embodiment, a singular value decomposition (SVD) of the matrix W is computed. The computation is as follows:

$$W \approx W' = U S V^{T}, \qquad (1)$$

in which U is the (M×R) matrix of left singular vectors $u_i$ ($1 \le i \le M$), S is the (R×R) diagonal matrix of singular values $s_r$ ($1 \le r \le R$), V is the (N×R) matrix of right singular vectors $v_j$ ($1 \le j \le N$), $R \ll M, N$ is the order of the decomposition, and $T$ denotes matrix transposition. The LSA method uses the SVD to define a mapping between the discrete sets V and T and the continuous vector space S spanned by U and V. As a result, each word $w_i$ in V is represented by a vector $u_i$ in S, and each document $d_j$ in T is represented by a vector $v_j$ in S. This mapping makes it possible to compute the following LSA language model probability:

$$\Pr(w_q \mid H_{q-1}) = \Pr(w_q \mid \tilde{d}_{q-1}), \qquad (2)$$

where $w_q$ is the current word and $H_{q-1}$ is the associated history for this word, i.e., the current document so far (also referred to as the current pseudo-document). This is done in three steps: (i) construct sparse representations $w_q$ and $\tilde{d}_{q-1}$ for the current word and pseudo-document, (ii) map these quantities to vectors $u_q$ and $\tilde{v}_{q-1}$ in the space S, and (iii) use a suitable measure in S to evaluate the closeness between $u_q$ and $\tilde{v}_{q-1}$.
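
To make the decomposition concrete, the following is a minimal Python sketch of equation (1), building the rank-R LSA space with a truncated SVD. It is an illustrative sketch, not the patented implementation; the names (build_lsa_space, word_doc_counts) and the dense-matrix simplification are assumptions.

```python
# Minimal sketch of equation (1): W ~ W' = U S V^T, truncated to rank R.
# A real M x N corpus matrix would be sparse and would use a sparse SVD.
import numpy as np

def build_lsa_space(word_doc_counts: np.ndarray, R: int):
    """word_doc_counts: M x N co-occurrence matrix W.
    Returns (U, S, Vt) truncated to the R largest singular values."""
    U, s, Vt = np.linalg.svd(word_doc_counts, full_matrices=False)
    return U[:, :R], np.diag(s[:R]), Vt[:R, :]

# Row U[i] then represents word w_i in the space S, and column Vt[:, j]
# represents document d_j, which is what the mapping in equation (2) uses.
```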

This in turn leads to the hybrid n-gram+LSA language model probability:

$$\Pr(w_q \mid \tilde{H}_{q-1}) = \frac{\Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_q)}{\sum_{w_i \in V} \Pr(w_i \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_i)}, \qquad (3)$$

where $\tilde{H}_{q-1}$ is the overall available history (comprising an n-gram component as well as the LSA component mentioned above).

Memory 310 also includes input text 320 for the storage of utterances to be recognized. During recognition, the speaker inputs utterances for processing. In a method similar to that discussed above, the recognition input is normalized and stored in input text 320.

FIG. 4 is a block diagram of one embodiment for text co-occurrence matrix 314, which is a matrix of M words 420 by N documents 404. In one embodiment, M = 20,000 words and N = 500,000 documents.
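
Purely for illustration, a sketch of how such a word-by-document matrix might be assembled is shown below; plain counts and dense storage are assumptions here, not necessarily the cell weighting used in the patent.

```python
# Hypothetical sketch of assembling the M x N co-occurrence matrix W of FIG. 4.
from collections import Counter
import numpy as np

def cooccurrence_matrix(documents: list[list[str]], vocab: list[str]) -> np.ndarray:
    """W[i, j] is the number of times vocab[i] occurs in documents[j]."""
    index = {w: i for i, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for word, count in Counter(doc).items():
            if word in index:  # out-of-vocabulary words are ignored
                W[index[word], j] = count
    return W
```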

FIG. 5 is a block diagram of one embodiment for decomposition matrices 316. In one embodiment, a singular value decomposition (SVD) of the matrix W is performed. The decomposition is as follows:

$$W \approx W' = U S V^{T}, \qquad (4)$$

in which U is the (M×R) matrix of left singular vectors $u_i$ ($1 \le i \le M$), S is the (R×R) diagonal matrix of singular values $s_r$ ($1 \le r \le R$), V is the (N×R) matrix of right singular vectors $v_j$ ($1 \le j \le N$), $R \ll M, N$ is the order of the decomposition, and $T$ denotes matrix transposition. The LSA method uses the SVD to define a mapping between the discrete sets V and T and the continuous vector space S spanned by U and V. As a result, each word $w_i$ in V is represented by a vector $u_i$ in S, and each document $d_i$ in T is represented by a vector $v_i$ in S.

FIG. 6 is a flow diagram of one embodiment for dynamic selection of context scope in latent semantic analysis (LSA) language modeling. Initially, at processing block 605, the n-gram (local) probabilities for the current word are computed. The n-gram probabilities define the likelihood that a particular word within the system vocabulary (defined by lexicon 112) will occur immediately following a string of n−1 words which are also within the system vocabulary. Language model 130 provides, for each word $w_q$ in an available vocabulary V, a conditional probability $\Pr(w_q \mid H_q^{(t)})$ that the word $w_q$ will occur given a local context, or history, $H_q^{(t)}$, consisting of a string of n−1 words $w_{q-1}, w_{q-2}, \ldots, w_{q-n+1}$, as follows:

$$\Pr(w_q \mid H_q^{(t)}) = \Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}). \qquad (5)$$
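
As a rough illustration of equation (5), the sketch below estimates the local probabilities by maximum likelihood from n-gram counts. Smoothing and backoff, which any practical recognizer would add, are deliberately omitted, and the class and method names are assumptions.

```python
# Minimal maximum-likelihood n-gram model for equation (5); no smoothing.
from collections import defaultdict

class NGramModel:
    def __init__(self, n: int):
        self.n = n
        self.context_counts = defaultdict(int)  # (n-1)-word history counts
        self.ngram_counts = defaultdict(int)    # full n-gram counts

    def train(self, sentence: list[str]) -> None:
        for k in range(len(sentence) - self.n + 1):
            history = tuple(sentence[k:k + self.n - 1])
            self.context_counts[history] += 1
            self.ngram_counts[history + (sentence[k + self.n - 1],)] += 1

    def prob(self, word: str, history: tuple) -> float:
        """Pr(w_q | w_{q-1} ... w_{q-n+1}) by maximum likelihood."""
        history = tuple(history[-(self.n - 1):])
        denom = self.context_counts[history]
        return self.ngram_counts[history + (word,)] / denom if denom else 0.0
```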

Given a set of probabilities defined in accordance with equation (5), hybrid training/recognition processor 108 can search for, and assess the likelihood of, various text hypotheses in producing the output message. The probabilities $\Pr(w_q \mid H_q^{(t)})$ may be estimated during a training phase using existing text databases. For example, the Linguistic Data Consortium sponsored by the Advanced Research Projects Agency (ARPA) provides a wide range of application-specific databases which can be used for training purposes.

At processing block 610, a vector representation of the current document in a latent semantic analysis (LSA) space is determined.

In one embodiment, all utterances spoken since the beginning of the session are part of the current document. It can be shown from (1) that this approach corresponds to the following closed form for the pseudo-document $\tilde{d}_q$ at time q, in which the vector representation of the current document in the LSA space, $\tilde{v}_q$, is computed as follows:

$$\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}, \qquad (6)$$

in which $n_q$ is the total number of words present in the pseudo-document at time q, $i_p$ is the index of the word observed at time p, and $\varepsilon_{i_p}$ is the normalized entropy of this word in the corpus T. However, this embodiment is only adequate if the user starts a new session each time he or she wants to work on a new document. If the user needs to dictate in a heterogeneous manner, this solution will fail, because the pseudo-document $\tilde{d}_q$ built under this embodiment will not be sufficiently representative of each individual document.
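
A minimal sketch of equation (6) follows, assuming one word is observed per time step (so that $n_q$ equals the number of indices seen) and reusing the U and S factors from the SVD sketch above; the helper name and the precomputed entropy array eps are illustrative assumptions.

```python
# Sketch of equation (6): pseudo-document vector over the whole session.
import numpy as np

def pseudo_doc_full(word_indices: list[int], U: np.ndarray,
                    S: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """v_q = (1/n_q) * sum_p (1 - eps[i_p]) * U[i_p] * S^{-1}."""
    n_q = len(word_indices)
    v = np.zeros(U.shape[1])
    for i_p in word_indices:
        v += (1.0 - eps[i_p]) * U[i_p]
    # Apply S^{-1}; S is the R x R diagonal matrix of singular values.
    return (v / n_q) @ np.linalg.inv(S)
```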

In an alternate embodiment, the size of the history considered is limited, so as to avoid relying on old, possibly obsolete fragments to construct the current context. The size limit could be expressed in anything from words to paragraphs. If, for example, only the last P words are assumed to belong to the current document, this approach corresponds to computing the latest pseudo-document vector using a truncated version of (6), as follows:

$$\tilde{v}_q = \frac{1}{P} \sum_{p=q-P+1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}. \qquad (7)$$

The constant P is highly dependent on the kind of documents spoken by the user.
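
Under the same assumptions as the previous sketch, the windowed variant of equation (7) only changes which indices enter the sum; this sketch further assumes at least P words have been observed.

```python
# Sketch of equation (7): restrict the pseudo-document to the last P words.
import numpy as np

def pseudo_doc_window(word_indices: list[int], U: np.ndarray,
                      S: np.ndarray, eps: np.ndarray, P: int) -> np.ndarray:
    v = np.zeros(U.shape[1])
    for i_p in word_indices[-P:]:  # only the P most recent words
        v += (1.0 - eps[i_p]) * U[i_p]
    return (v / P) @ np.linalg.inv(S)
```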

In an alternate embodiment, it is possible to adopt an intermediate solution, which allows for some discounting of old data without requiring a hard decision about the size of the caching window. In this embodiment, exponential forgetting is used to progressively discount older utterances. Assuming $0 < \lambda \le 1$, this approach corresponds to the closed form solution given by:

$$\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} \lambda^{(n_q - n_p)} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}, \qquad (8)$$

where the gap between λ and 1 tracks the expected heterogeneity of the session. In addition, a hard limit may be concurrently placed on the size of the history as in (7).
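
Continuing the same hypothetical sketch, equation (8) weights each word by λ raised to its age; with one word per time step, $n_p$ reduces to the position p, which is the simplifying assumption made below.

```python
# Sketch of equation (8): exponential forgetting with decay 0 < lam <= 1.
import numpy as np

def pseudo_doc_forgetting(word_indices: list[int], U: np.ndarray,
                          S: np.ndarray, eps: np.ndarray,
                          lam: float) -> np.ndarray:
    n_q = len(word_indices)
    v = np.zeros(U.shape[1])
    for p, i_p in enumerate(word_indices, start=1):
        v += (lam ** (n_q - p)) * (1.0 - eps[i_p]) * U[i_p]  # older words decay
    return (v / n_q) @ np.linalg.inv(S)
```

Setting lam = 1.0 recovers the unbounded-context form of equation (6), which matches the λ = 1 column of the experiments reported below.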

At processing block 615, the LSA (global) probabilities based upon the vector representation of the current document in a latent semantic analysis (LSA) space are computed, as described in reference to FIG. 3.

At processing block 620, the n-gram and LSA probabilities are combined. The hybrid n-gram+LSA language model probability is computed, as in (3), as follows:

$$\Pr(w_q \mid \tilde{H}_{q-1}) = \frac{\Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_q)}{\sum_{w_i \in V} \Pr(w_i \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_i)}, \qquad (11)$$

where $\tilde{H}_{q-1}$ is the overall available history (comprising an n-gram component as well as the LSA component mentioned above).
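
To show how the two probability streams interact, here is a hedged sketch of the combination in (3)/(11); pr_ngram and pr_lsa are assumed callables standing in for the n-gram model and for $\Pr(\tilde{d}_{q-1} \mid w)$ derived from closeness in the LSA space, and are not names from the patent.

```python
# Sketch of the hybrid combination: renormalize the n-gram probability by
# the LSA likelihood of the pseudo-document, summed over the vocabulary.
from typing import Callable, Iterable

def hybrid_prob(w_q: str, history: tuple, vocab: Iterable[str],
                pr_ngram: Callable, pr_lsa: Callable, v_doc) -> float:
    num = pr_ngram(w_q, history) * pr_lsa(v_doc, w_q)
    den = sum(pr_ngram(w_i, history) * pr_lsa(v_doc, w_i) for w_i in vocab)
    return num / den if den else 0.0
```

Note that the denominator requires a pass over the full vocabulary, which is why the combination is typically applied during rescoring rather than in the innermost search loop.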

Preliminary experiments were conducted on a subset of the Wall Street Journal 20,000 word-vocabulary, continuous speech task. The acoustic training corpus consisted of 7,200 sentences of data uttered by 84 different native speakers of English. The language model training corpus was the ARPA North American Business News corpus, as previously described in U.S. Pat. No. 5,839,106, herein incorporated by reference. All experiments were performed using the hybrid bigram+LSA language model obtained in U.S. Pat. No. 5,839,106.

This system was tested on 12 additional native speakers of English, who uttered a total of 496 test sentences. This test corpus was constructed with no more than 3 or 4 consecutive sentences extracted from a single article. Overall, the corpus comprised 140 distinct document fragments, which means that each speaker spoke, on average, about 12 different “mini-documents.” As a result, the context effectively changed every 60 words or so. This is a situation where appropriately forgetting the context is the key to avoiding reliance on an obsolete representation.

We performed dynamic context scope selection using the exponential forgetting framework described above. The value of the parameter λ varied from λ = 1 (unbounded context) to λ = 0.95 (restrictive context) in decrements of 0.01. The results are presented in Table 1. It can be seen that performance improves substantially with a modicum of exponential forgetting (0.97 ≤ λ < 1), as the pseudo-document representation becomes less and less contaminated with obsolete data. However, if forgetting is too aggressive (here, for λ < 0.97), the performance starts degrading, because the effective context no longer has a length appropriate to the task at hand.

Table 1

Speaker    λ = 1.0   λ = 0.99   λ = 0.98   λ = 0.97   λ = 0.96   λ = 0.95
001         7.7%     11.9%      11.2%       4.9%      −2.1%      −3.5%
002        27.7%     33.3%      33.9%      35.0%      37.9%      36.2%
00a        15.7%     25.2%      21.2%      25.9%      23.0%      20.8%
00b         8.2%      9.7%       7.8%       9.7%       7.8%       7.8%
00c        10.3%     12.9%      17.6%      16.5%      16.5%      16.2%
00d        16.1%     27.8%      33.6%      35.4%      39.2%      33.0%
00f        10.7%     11.1%      15.3%      16.9%      16.5%      16.9%
203        15.4%     21.5%      32.2%      34.2%      33.6%      28.9%
400        15.9%     17.0%      18.1%      19.8%      19.2%      16.5%
430        12.6%     19.3%      20.2%      17.6%      15.4%      10.9%
431         8.9%     15.0%      18.3%      18.3%      17.8%      13.6%
432        11.2%     16.2%      23.5%      27.9%      27.9%      26.3%
Overall    13.2%     18.4%      21.1%      21.9%      21.6%      19.3%

The specific arrangements and methods herein are merely illustrative of the principles of this invention. Numerous modifications in form and detail may be made by those skilled in the art without departing from the true spirit and scope of the invention.

1. A method of language modeling of a document comprising: computing a plurality of local probabilities of a current document; determining a vector representation of the current document in a latent semantic analysis (LSA) space, wherein the scope of the current document is adjustable; computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and combining the local probabilities and the global probabilities to produce a language model.

2. The method of claim 1 wherein the plurality of local probabilities is based upon an n-gram paradigm.

3. The method of claim 1 wherein the plurality of local probabilities $\Pr(w_q \mid H_q^{(t)})$ for a particular word $w_q$, drawn from a vocabulary V comprising a plurality of words $w_i$, given a hybrid contextual history $H_q^{(t)}$ of n−1 words $w_{q-1}, w_{q-2}, \ldots, w_{q-n+1}$, is computed as: $\Pr(w_q \mid H_q^{(t)}) = \Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}).$

4. The method of claim 1 wherein the vector representation of the current document in an LSA space is generated from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T.

5. The method of claim 4 wherein the vector representation of the current document in an LSA space is based upon all words from a beginning of a session.

6. The method of claim 5 wherein the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, is computed as: $\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

7. The method of claim 4 wherein the vector representation of the current document in an LSA space is based upon a plurality of temporally adjacent words.

8. The method of claim 7 wherein the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, P is the number of temporally adjacent words up to the current word, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, is computed as: $\tilde{v}_q = \frac{1}{P} \sum_{p=q-P+1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

9. The method of claim 4 wherein the vector representation of the current document in an LSA space is based upon a plurality of exponentially weighted temporally adjacent words.

10. The method of claim 9 wherein the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, $0 < \lambda \le 1$, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, is computed as: $\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} \lambda^{(n_q - n_p)} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

11. The method of claim 1 wherein the plurality of global probabilities is based upon a latent semantic paradigm.

12. The method of claim 1 wherein the plurality of global probabilities $\Pr(w_q \mid H_{q-1})$ for a particular word $w_q$, for an associated history of the word, $H_{q-1}$, for the current document $\tilde{d}_{q-1}$, is computed as: $\Pr(w_q \mid H_{q-1}) = \Pr(w_q \mid \tilde{d}_{q-1}),$ based upon the vector representation of the current document in an LSA space.

13. The method of claim 12 wherein combining the local probabilities and the global probabilities is computed as follows: $$\Pr(w_q \mid \tilde{H}_{q-1}) = \frac{\Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_q)}{\sum_{w_i \in V} \Pr(w_i \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_i)}.$$

14. A system for language modeling of a document comprising: means for computing a plurality of local probabilities of a current document; means for determining a vector representation of the current document in a latent semantic analysis (LSA) space, wherein the scope of the current document is adjustable; means for computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and means for combining the local probabilities and the global probabilities to produce a language model.

15. A computer readable medium comprising instructions, which when executed on a processor, perform a method for language modeling of a document, comprising: computing a plurality of local probabilities of a current document; determining a vector representation of the current document in a latent semantic analysis (LSA) space, wherein the scope of the current document is adjustable; computing a plurality of global probabilities based upon the vector representation of the current document in an LSA space; and combining the local probabilities and the global probabilities to produce a language model.

16. A system for language modeling of a document comprising: a hybrid training/recognition processor configured to compute a plurality of local probabilities of a current document, determine a vector representation of the current document in a latent semantic analysis (LSA) space wherein the scope of the current document is adjustable, compute a plurality of global probabilities based upon the vector representation of the current document in an LSA space, and combine the local probabilities and the global probabilities to produce a language model.

17. The system of claim 16 wherein the processor is further configured to generate the plurality of local probabilities based upon an n-gram paradigm.

18. The system of claim 16 wherein the processor is further configured to generate the plurality of local probabilities $\Pr(w_q \mid H_q^{(t)})$ for a particular word $w_q$, drawn from a vocabulary V comprising a plurality of words $w_i$, given a hybrid contextual history $H_q^{(t)}$ of n−1 words $w_{q-1}, w_{q-2}, \ldots, w_{q-n+1}$, as: $\Pr(w_q \mid H_q^{(t)}) = \Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}).$

19. The system of claim 16 wherein the processor is further configured to generate the vector representation of the current document in an LSA space from at least one decomposition matrix of a singular value decomposition of a co-occurrence matrix, W, between M words in a vocabulary V and N documents in a text corpus T.

20. The system of claim 19 wherein the processor is further configured to generate the vector representation of the current document in an LSA space based upon all words from a beginning of a session.

21. The system of claim 20 wherein the processor is further configured to generate the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as: $\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

22. The system of claim 19 wherein the processor is further configured to generate the vector representation of the current document in an LSA space based upon a plurality of temporally adjacent words.

23. The system of claim 22 wherein the processor is further configured to generate the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, P is the number of temporally adjacent words up to the current word, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as: $\tilde{v}_q = \frac{1}{P} \sum_{p=q-P+1}^{q} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

24. The system of claim 16 wherein the processor is further configured to generate the vector representation of the current document in an LSA space based upon a plurality of exponentially weighted temporally adjacent words.

25. The system of claim 16 wherein the processor is further configured to generate the vector representation of the current document in an LSA space, $\tilde{v}_q$, at time q, wherein $n_q$ is the total number of words in the current document, $i_p$ is the index of the word observed at time p, $\varepsilon_{i_p}$ is the normalized entropy of the word observed at time p within a text corpus T, $0 < \lambda \le 1$, $u_{i_p}$ is the left singular vector at time p of the singular value decomposition of W, and S is the diagonal matrix of singular values of the singular value decomposition of W, as: $\tilde{v}_q = \frac{1}{n_q} \sum_{p=1}^{q} \lambda^{(n_q - n_p)} (1 - \varepsilon_{i_p})\, u_{i_p} S^{-1}.$

26. The system of claim 16 wherein the processor is further configured to generate the plurality of global probabilities based upon a latent semantic paradigm.

27. The system of claim 16 wherein the processor is further configured to generate the plurality of global probabilities $\Pr(w_q \mid H_{q-1})$ for a particular word $w_q$, for an associated history of the word, $H_{q-1}$, for the current document $\tilde{d}_{q-1}$, as: $\Pr(w_q \mid H_{q-1}) = \Pr(w_q \mid \tilde{d}_{q-1}),$ based upon the vector representation of the current document in an LSA space.

28. The system of claim 27 wherein the processor is further configured to combine the local probabilities and the global probabilities as follows: $$\Pr(w_q \mid \tilde{H}_{q-1}) = \frac{\Pr(w_q \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_q)}{\sum_{w_i \in V} \Pr(w_i \mid w_{q-1} w_{q-2} \ldots w_{q-n+1}) \, \Pr(\tilde{d}_{q-1} \mid w_i)}.$$